Skip to content

Unify multiplier-bootstrap RNG: drop Rust weight helper, numpy canonical#362

Closed
igerber wants to merge 3 commits intomainfrom
fix/bootstrap-weight-rng-parity
Closed

Unify multiplier-bootstrap RNG: drop Rust weight helper, numpy canonical#362
igerber wants to merge 3 commits intomainfrom
fix/bootstrap-weight-rng-parity

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 24, 2026

Summary

  • Closes a cross-backend silent failure: rust/src/bootstrap.rs::generate_bootstrap_weights_batch seeded Xoshiro256PlusPlus::seed_from_u64(seed + i) per row while diff_diff.bootstrap_utils.generate_bootstrap_weights_batch_numpy consumed numpy.random.default_rng (PCG64). Same seed → different Rademacher/Mammen/Webb weights depending on whether the Rust backend was compiled in, and therefore different bootstrap SE / CI / p-values in CallawaySantAnna, ContinuousDiD, ImputationDiD, TwoStageDiD, ChaisemartinDHaultfoeuille (multi-horizon PSU + group-level + PSU-broadcast multiplier bootstrap), and EfficientDiD (cluster + unit multiplier bootstrap). No warning fired.
  • Fix: delete the Rust helper entirely (plus the now-orphaned rand + rand_xoshiro Cargo deps) and collapse the Python shim to call the existing numpy implementation directly. Unlike the TROP bootstrap fix (PR Fix TROP bootstrap SE backend divergence under fixed seed #354) the Rust helper had no post-RNG work, so threading pre-generated weights through PyO3 would have left a pointless pass-through — same shape as the compute_synthetic_weights delete-and-inline from PRs Delete compute_synthetic_weights shim; inline Frank-Wolfe in rank_control_units #344/Remove dead Rust compute_synthetic_weights (follow-up to PR #344) #345.
  • SDID and TROP are not affected: SDID's bootstrap is pairs + Rao-Wu + weighted Frank-Wolfe (no multiplier weights); TROP's bootstrap uses stratified_bootstrap_indices. Neither imports the deleted helper.
  • User-visible effect: users running with the Rust backend compiled in will see different-but-equally-valid multiplier-bootstrap SE / CI / p-values under the same seed after this change (numpy PCG64 replaces Rust Xoshiro). Pure-Python-backend users are unaffected. Sampling distributions are unchanged.
  • Distributional coverage (shape, support, moments) moves from the deleted Rust tests to a new TestBootstrapWeightsBatchDistributions class in tests/test_bootstrap_utils.py — 11 tests including byte-level default_rng(42) seed pins for each of Rademacher, Mammen, Webb. New test_rust_bootstrap_weights_batch_is_removed regression guard in tests/test_rust_backend.py mirrors the existing test_compute_synthetic_weights_is_removed pattern.

Net diff: +164 / -434 across 10 files. Closes the silent-failures audit TODO row for the Rust multiplier-weight RNG.

Test plan

  • maturin develop --release builds cleanly with rand / rand_xoshiro dropped from rust/Cargo.toml.
  • from diff_diff._rust_backend import generate_bootstrap_weights_batch raises ImportError.
  • tests/test_bootstrap_utils.py green (36 tests including the new 11-test distributional class with seed-pinned baselines).
  • tests/test_rust_backend.py green (86 tests including both removal regression guards).
  • 743 tests across the 6 affected estimators green on Rust backend: test_staggered.py, test_continuous_did.py, test_imputation.py, test_two_stage.py, test_efficient_did.py, test_chaisemartin_dhaultfoeuille.py.
  • Same 6 estimators' bootstrap tests green under DIFF_DIFF_BACKEND=python.

🤖 Generated with Claude Code

Closes the silent-failures audit follow-up for the Rust multiplier-weight
RNG. `rust/src/bootstrap.rs::generate_bootstrap_weights_batch` seeded
`Xoshiro256PlusPlus::seed_from_u64(seed + i)` per row while
`diff_diff.bootstrap_utils.generate_bootstrap_weights_batch_numpy` ran
under `numpy.random.default_rng` (PCG64), so the same `seed` produced
different Rademacher/Mammen/Webb weights depending on whether the Rust
backend was compiled in — and therefore different multiplier-bootstrap
SE / CI / p-values in CallawaySantAnna, ContinuousDiD, ImputationDiD,
TwoStageDiD, ChaisemartinDHaultfoeuille, and EfficientDiD.

Unlike the TROP bootstrap fix (PR #354), this Rust helper did nothing
after generating weights, so threading pre-generated weights through
PyO3 would have left the function as a pointless pass-through.
Deleted the `rust/src/bootstrap.rs` module entirely together with the
orphaned `rand` + `rand_xoshiro` Cargo deps; the Python shim now calls
the numpy implementation directly (with `_numpy` kept as an alias for
backward compatibility). The three distributional property tests move
to `tests/test_bootstrap_utils.py::TestBootstrapWeightsBatchDistributions`
including byte-level seed-pinned baselines for each distribution.
`test_rust_bootstrap_weights_batch_is_removed` mirrors the existing
`test_compute_synthetic_weights_is_removed` pattern as a regression
guard against accidental re-export.

Users running with the Rust backend compiled in will see
different-but-equally-valid bootstrap SE / CI / p-values under the
same `seed` (PCG64 replaces Xoshiro); pure-Python users are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

  • The changed methodology surface is the shared multiplier-bootstrap weight generator used by CallawaySantAnna, ContinuousDiD, ImputationDiD, TwoStageDiD, EfficientDiD, and ChaisemartinDHaultfoeuille.
  • Cross-checking the Methodology Registry shows the relevant contract is the multiplier-weight law and PSU/unit-level perturbation formula, not the specific RNG engine: docs/methodology/REGISTRY.md:L342-L348, docs/methodology/REGISTRY.md:L1107-L1108, docs/methodology/REGISTRY.md:L1172-L1174, docs/methodology/REGISTRY.md:L2948-L2956.
  • The Rust-only fork is removed cleanly from the backend import surface and PyO3 registration, so the prior fixed-seed backend divergence no longer has a live alternate path: diff_diff/_backend.py:L18-L41, diff_diff/__init__.py:L17-L23, rust/src/lib.rs:L17-L25.
  • Added regression coverage is well targeted for this fix: helper-level distribution/moment/seed-pin checks and a guard against re-exporting the deleted Rust binding: tests/test_bootstrap_utils.py:L274-L392, tests/test_rust_backend.py:L54-L74.
  • Review is code-inspection only; I could not run tests locally because this environment lacks numpy and pytest.

Methodology

  • Severity: P3-informational
    Impact: No methodology defect found. The helper still generates the documented Rademacher/Mammen/Webb multiplier distributions, and downstream estimators still perturb the same influence-function objects. Changing from Rust/Xoshiro to NumPy/default RNG is an implementation choice, not an undocumented deviation, missing assumption check, or variance/SE error. Relevant call sites remain diff_diff/staggered_bootstrap.py:L350-L352, diff_diff/continuous_did.py:L1464-L1466, diff_diff/imputation_bootstrap.py:L329-L330, diff_diff/two_stage_bootstrap.py:L321-L323, diff_diff/efficient_did_bootstrap.py:L126-L133, and diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L848-L872.
    Concrete fix: None.

Code Quality

  • Severity: P3-informational
    Impact: The cleanup is mechanically complete. I did not find stale imports, partial exports, or a hidden fallback that would preserve the old backend split: diff_diff/_backend.py:L18-L41, diff_diff/_backend.py:L44-L64, diff_diff/__init__.py:L17-L23, rust/src/lib.rs:L17-L25.
    Concrete fix: None.

Performance

  • Severity: P3-informational
    Impact: No correctness-adjacent performance bug is introduced. The diff intentionally removes a Rust-only accelerator for weight generation and keeps the downstream bootstrap algebra vectorized in NumPy; that is a deliberate tradeoff for backend-invariant seeded behavior, not an accidental hotspot: diff_diff/bootstrap_utils.py:L160-L218.
    Concrete fix: None.

Maintainability

  • Severity: P3-informational
    Impact: This reduces maintenance burden by deleting a backend-specific helper whose only responsibility was RNG-driven weight generation. The single implementation path plus the no-reexport guard materially lowers the chance of future silent backend drift: diff_diff/bootstrap_utils.py:L160-L218, tests/test_rust_backend.py:L54-L74.
    Concrete fix: None.

Tech Debt

  • Severity: P3-informational
    Impact: No new deferred debt is introduced. The change removes a backend-divergence class and adds regression coverage around the exact failure mode it is meant to prevent: tests/test_bootstrap_utils.py:L274-L392, tests/test_rust_backend.py:L54-L74.
    Concrete fix: None.

Security

  • Severity: P3-informational
    Impact: No security issues identified in scope. If anything, the diff slightly reduces native surface area by deleting a PyO3-exported Rust function.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3
    Impact: diff_diff/bootstrap_utils.py:L169-L172 overstates the new contract by saying the helper runs under “numpy PCG64 RNG exclusively.” The function still accepts any np.random.Generator; only the library’s internal seeded call sites happen to use np.random.default_rng(...). This is a documentation precision issue, not a runtime defect.
    Concrete fix: Reword the docstring to say the library’s internal call sites are default_rng-based and that backend invariance is with respect to the supplied generator state, not specifically PCG64 for arbitrary external callers.

  • Severity: P3-informational
    Impact: The new regression coverage is appropriate for this change: alias/back-compat, distribution moments, fixed-seed byte pins, and a Rust-binding removal guard are all present in the diff: tests/test_bootstrap_utils.py:L284-L392, tests/test_rust_backend.py:L54-L74.
    Concrete fix: None.

The helper accepts any ``np.random.Generator``, not specifically PCG64;
library callers happen to seed via ``np.random.default_rng``.
Clarifies that backend invariance is with respect to the supplied
generator state, not a specific engine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: d58fa6497558d2ea3540e21d233bdcaee4e68f3c


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • The changed methodology surface is the shared multiplier-bootstrap weight generator used by CallawaySantAnna, ContinuousDiD, ImputationDiD, TwoStageDiD, EfficientDiD, and ChaisemartinDHaultfoeuille: diff_diff/staggered_bootstrap.py:L329-L352, diff_diff/continuous_did.py:L1452-L1465, diff_diff/imputation_bootstrap.py:L309-L330, diff_diff/two_stage_bootstrap.py:L303-L323, diff_diff/efficient_did_bootstrap.py:L100-L133, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L847-L878.
  • docs/methodology/REGISTRY.md and the in-code bootstrap docstrings constrain the multiplier laws and clustering/PSU semantics, not the RNG engine. The NumPy-canonical change preserves the documented Rademacher/Mammen/Webb distributions and perturbation contracts, so this is an implementation choice, not a methodology deviation: docs/methodology/REGISTRY.md:L332-L360, docs/methodology/REGISTRY.md:L1107-L1108, docs/methodology/REGISTRY.md:L1172-L1174, docs/methodology/REGISTRY.md:L1859-L1863, docs/methodology/REGISTRY.md:L2948-L2956.
  • The divergent Rust path is removed cleanly from both Python and Rust surfaces: diff_diff/bootstrap_utils.py:L160-L219, diff_diff/_backend.py:L18-L23, diff_diff/_backend.py:L44-L48, diff_diff/_backend.py:L117-L123, diff_diff/__init__.py:L17-L23, rust/src/lib.rs:L17-L25, rust/Cargo.toml:L23-L28.
  • Re-review note: the prior P3 docstring precision nit is resolved; the helper now correctly says library call sites use np.random.default_rng(...), while any np.random.Generator is accepted: diff_diff/bootstrap_utils.py:L166-L173.
  • Regression coverage is well targeted: alias/back-compat, invalid-type handling, distribution/moment checks, fixed-seed pins, and a guard against re-exporting the deleted Rust binding: tests/test_bootstrap_utils.py:L274-L392, tests/test_rust_backend.py:L46-L74.
  • Review is code-inspection only; I could not rerun the tests locally because this environment lacks numpy.

Methodology

  • Severity: P3-informational. Impact: No methodology defect found. The PR does not alter estimator equations, identification assumptions, weighting formulas, or variance semantics; it only removes a backend-specific RNG implementation while keeping the documented multiplier distributions and unit/group/PSU perturbation contracts intact. Concrete fix: None. docs/methodology/REGISTRY.md:L332-L360, docs/methodology/REGISTRY.md:L1107-L1108, docs/methodology/REGISTRY.md:L1172-L1174, docs/methodology/REGISTRY.md:L2948-L2956, diff_diff/imputation_bootstrap.py:L284-L294, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L141-L155

Code Quality

  • Severity: P3-informational. Impact: Cleanup is mechanically complete; I did not find stale imports, partial exports, or a hidden fallback that would preserve the old divergent Rust path. Concrete fix: None. diff_diff/_backend.py:L18-L23, diff_diff/_backend.py:L44-L48, diff_diff/_backend.py:L117-L123, diff_diff/__init__.py:L17-L23, rust/src/lib.rs:L17-L25

Performance

  • Severity: P3-informational. Impact: The PR trades a small Rust-only accelerator for backend-invariant seeded behavior. That may modestly slow very wide bootstraps, but no correctness-adjacent performance bug is introduced in the changed logic. Concrete fix: None. CHANGELOG.md:L30-L30, diff_diff/bootstrap_utils.py:L191-L210

Maintainability

  • Severity: P3-informational. Impact: One canonical implementation path plus the no-reexport regression guard materially reduces future backend drift risk. Concrete fix: None. diff_diff/bootstrap_utils.py:L217-L219, tests/test_rust_backend.py:L54-L74

Tech Debt

  • Severity: P3-informational. Impact: This PR closes tracked debt rather than adding new deferred work; the changelog explicitly records closure of the multiplier-RNG audit item. Concrete fix: None. CHANGELOG.md:L30-L30

Security

  • Severity: P3-informational. Impact: No security issues identified in scope. The native PyO3 surface is slightly smaller after deleting the bootstrap binding. Concrete fix: None. rust/src/lib.rs:L21-L25

Documentation/Tests

  • Severity: P3-informational. Impact: Coverage is appropriate for the change, and the previously noted docstring overstatement is fixed. Concrete fix: None. diff_diff/bootstrap_utils.py:L166-L173, tests/test_bootstrap_utils.py:L274-L392, tests/test_rust_backend.py:L46-L74

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 24, 2026
…value

The PR's RNG unification is working as intended — Rust now produces the
same bootstrap SE as pure-Python under the same seed — but the pinned
`TestBootstrapCellPeriod::test_bootstrap_se_matches_pre_pr4_baseline`
guard held two separate per-backend baselines from the pre-RNG-unified
era. The Rust backend now produces exactly the value that was formerly
pinned as the Python-only baseline (0.3030802540369796), so the split
is obsolete. Collapse the two constants into one, drop the HAS_RUST_BACKEND
branch + os env-var check, and document the convergence in the comment.

The regression guard's semantic is unchanged: under PSU=group the
dispatcher still routes through the legacy group-level bootstrap path,
and the assertion is still ULP-precision bit-identity against the
captured baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 9af7199787dcd2ba578bbf75015dd992e46b52ef


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • The changed methodology surface is the shared multiplier-bootstrap weight generator used by CallawaySantAnna, ContinuousDiD, ImputationDiD, TwoStageDiD, EfficientDiD, and ChaisemartinDHaultfoeuille.
  • Cross-checking the registry and in-code references shows the contract is the multiplier law and the perturbation target (unit/cluster/PSU IF sums), not the RNG engine; switching from Rust Xoshiro to NumPy PCG64 is an implementation choice, not a methodology deviation.
  • The cleanup is mechanically complete: the deleted Rust helper is removed from Python exports, the PyO3 module surface, and Cargo dependencies, while the changed callers still consume the same (n_bootstrap, n_units, weight_type, rng) interface.
  • Re-review scope: I did not find any new P1+ issue in the updated diff.
  • One minor documentation drift remains: some dCDH registry/docstring text still says the PSU fast path is bit-identical to pre-cell-level releases, which is no longer literally true for historical Rust fixed-seed output after this RNG canonicalization.
  • Review is code-inspection only; I could not execute the touched tests locally because this environment lacks pytest and numpy.

Methodology

  • Severity: P3-informational. Impact: No methodology defect found. The changed code only canonicalizes RNG for multiplier weights; it does not change estimator equations, identification assumptions, control-group definitions, weighting formulas, or inference-field handling. The affected source material and registry entries constrain the bootstrap distributions and IF/cluster/PSU perturbation contracts, all of which are preserved here. Concrete fix: None. diff_diff/bootstrap_utils.py:L160-L219, diff_diff/staggered_bootstrap.py:L155-L160, diff_diff/continuous_did.py:L1-L9, diff_diff/imputation.py:L1-L15, diff_diff/two_stage.py:L1-L14, diff_diff/efficient_did.py:L1-L13, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L1-L17, docs/methodology/REGISTRY.md:L332-L348, docs/methodology/REGISTRY.md:L586-L606, docs/methodology/REGISTRY.md:L843-L852, docs/methodology/REGISTRY.md:L1102-L1108, docs/methodology/REGISTRY.md:L1172-L1174, docs/methodology/REGISTRY.md:L2948-L2956

Code Quality

  • Severity: P3-informational. Impact: The removal is internally consistent. The deleted Rust symbol is gone from backend import wiring, package re-exports, the Rust module registration, and Cargo deps; the changed files do not leave a stale fallback or partial export behind. No new inline-inference or partial-NaN anti-pattern was introduced in scope. Concrete fix: None. diff_diff/_backend.py:L18-L23, diff_diff/_backend.py:L44-L48, diff_diff/_backend.py:L67-L88, diff_diff/_backend.py:L117-L138, diff_diff/__init__.py:L23-L29, rust/src/lib.rs:L17-L25, rust/Cargo.toml:L23-L28

Performance

  • Severity: P3-informational. Impact: Deleting the Rust-only weight generator may slightly slow very wide bootstraps, but the asymptotic estimator work is unchanged and the tradeoff is explicit: one canonical seeded path instead of backend drift. I do not see a correctness-adjacent performance regression in the changed logic. Concrete fix: None; if a later benchmark shows material slowdown, the safe shape is to pre-generate NumPy weights once and pass them into Rust consumers, not to reintroduce a second RNG implementation. diff_diff/bootstrap_utils.py:L160-L219, CHANGELOG.md:L31-L31

Maintainability

  • Severity: P3-informational. Impact: This substantially reduces future backend skew risk by collapsing to one implementation and adding a regression guard that prevents accidental re-export of the deleted Rust binding. Concrete fix: None. diff_diff/bootstrap_utils.py:L217-L219, tests/test_bootstrap_utils.py:L274-L392, tests/test_rust_backend.py:L46-L74

Tech Debt

  • Severity: P3-informational. Impact: In scope, this is debt payoff rather than new debt. The diff deletes a divergent backend path instead of adding another tracked limitation. Concrete fix: None. diff_diff/bootstrap_utils.py:L160-L219, rust/src/lib.rs:L17-L25, CHANGELOG.md:L31-L31

Security

  • Severity: P3-informational. Impact: No security issue found. The native extension surface is slightly smaller after removing one PyO3 binding and two RNG-related Rust dependencies. Concrete fix: None. rust/src/lib.rs:L17-L25, rust/Cargo.toml:L23-L28

Documentation/Tests

  • Severity: P3-informational. Impact: Minor wording drift remains in dCDH methodology docs/docstrings. They still describe the PSU-within-group-constant fast path as bit-identical to pre-cell-level releases, but after this PR that statement is only safely true relative to the current canonical NumPy/group-level path, not historical Rust fixed-seed output. Concrete fix: Clarify the wording in docs/methodology/REGISTRY.md and the dCDH bootstrap docstrings to scope the bit-identity claim to the canonical current path. docs/methodology/REGISTRY.md:L604-L606, docs/methodology/REGISTRY.md:L673-L677, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L30-L35, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L186-L190, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L649-L655, tests/test_survey_dcdh.py:L1816-L1884
  • Severity: P3-informational. Impact: The changed regression coverage is otherwise well targeted: distribution/moment checks, byte-level seed pins for the canonical helper, back-compat alias coverage, a removed-binding guard, and an updated dCDH pinned baseline. Concrete fix: None. tests/test_bootstrap_utils.py:L274-L392, tests/test_rust_backend.py:L54-L74, tests/test_survey_dcdh.py:L1859-L1884

@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

Closing without merging. On reflection, the correctness argument is really about cross-backend seed parity, not correctness per se — each backend independently produces valid draws from the specified multiplier distribution. The measured weight-generation slowdown is meaningful (numpy PCG64 vs Rust Xoshiro+rayon):

size n_boot n_units dist Rust ms numpy ms ratio
medium 999 1000 rademacher 0.40 3.44 8.7×
medium 999 1000 mammen 0.36 10.77 29.9×
large 999 10000 mammen 2.79 107.87 38.6×
stress 5000 1000 mammen 1.58 53.76 33.9×

At typical practitioner scales (n_units ≤ 1000) the absolute delta is 3–10ms per .fit() — small in the context of a multi-second bootstrap — but the delta grows with n_units (Mammen at n_units=10000 adds ~105ms per fit). Not worth paying for same-seed cross-backend reproducibility.

If a future session revisits this, the more promising path is a Rust helper that accepts Python-generated uniform bytes (not weights) and does the bucket/sign transform in parallel — preserves backend invariance without giving up rayon throughput.

@igerber igerber closed this Apr 24, 2026
@igerber igerber deleted the fix/bootstrap-weight-rng-parity branch April 24, 2026 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant